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Abstract 

The Fast Fourier Transform (FFT) is the most efficiently known way to compute the Discrete Fourier 
Transform (DFT) of an arbitrary n-length signal, and has a computational complexity of 0(n log n). If 
the DFT X of the signal x has only k non-zero coefficients (where k < n), can we do better? In [1], we 
addressed this question and presented a novel FFAST (Fast Fourier Aliasing-based Sparse Transform) 
algorithm that cleverly induces sparse graph alias codes in the DFT domain, via a Chinese-Remainder- 
Theorem (CRT)-guided sub-sampling operation of the time-domain samples. The resulting sparse graph 
alias codes are then exploited to devise a fast and iterative onion-peeling style decoder that computes 
an n length DFT of a signal using only 0{k) time-domain samples and 0{klogk) computations. The 
FFAST algorithm is applicable whenever k is sub-linear in n (i.e. k = o(n)), but is obviously most 
attractive when k is much smaller than n. 

In this paper, we adapt the FFAST framework of [1] to the case where the time-domain samples are 
corrupted by a white Gaussian noise. In particular, we show that the extended noise robust algorithm R- 
FFAST computes an n-length fc-sparse DFT X using 0(fclog^ n)' noise-corrupted time-domain samples, 
in 0{k log"* n) computations, i.e., sub-linear time complexity. While our theoretical results are for signals 
with a uniformly random support of the non-zero DFT coefficients and additive white Gaussian noise, 
we provide simulation results which demonstrates that the R-FFAST algorithm performs well even for 
signals like MR images, that have an approximately sparse Fourier spectrum with a non-uniform support 
for the dominant DFT coefficients. 

'in this paper, we assume a fixed sampling structure or measurement system, that is motivated by practical constraints, 
particularly for hardware implementation. If it is feasible to have a randomized measurement system, which permits the flexibility 
of choosing a different sampling stracture for every input signal, we also have a variant of the R-FFAST algorithm presented 
in [2] which can robustly compute an n-length fc-sparse DFT using 0{k\og^^^ n) samples and 0{k\o^n) computations. 
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(a) Log intensity plot of the 2_D-DFT of (b) Original ‘Brain’ image in spatial do- (c) Reconstructed ‘Brain’ image using 
the original ‘Brain’ image. main. the R-FFAST. 


Fig. 1. Application of the 2D R-FFAST algorithm to reconstruct the ‘Brain’ image acquired on an MR scanner with dimension 
504 X 504. The 2D R-FFAST algorithm reconstructs the ‘Brain’ image, as shown in Fig. 1(c), using overall 60.18% of the 
Fourier samples of Fig. 1(a). 


I. Introduction 

The Fast Fourier Transform is the fastest known way to compute the DFT of an arbitrary n-length 
signal, and has a computational complexity of O(relogn). Many applications of interest involve signals 
that have a sparse Fourier spectrum, e.g. signals relating to audio, image, and video data, biomedical 
signals, etc. In [1], we have proposed a novel FFAST (Fast Fourier Aliasing-based Sparse Transform) 
framework, that under idealized assumptions of no measurement noise and an exactly fc-sparse signal 
spectrum (i.e. there are precisely k non-zero DFT coefficients), computes an n-length DFT X using only 
0{k) time-domain samples in 0{klogk) arithmetic computations. For large signals, i.e., when n is of 
order of millions or tens of millions, as is becoming more relevant in the Big-data age, the gains over 
conventional FFT algorithms can be significant. The idealized assumptions in [1] were primarily used to 
highlight the conceptual framework underlying the FFAST architecture. 

In this paper, we adapt the noiseless FFAST framework of [1] to a noise robust R-FFAST algorithm. 
The robustness to measurement noise is achieved by judiciously modifying the FFAST framework, as 
explained in details in Section VII-B. Specifically, the FFAST front-end in [1] has multiple stages of 
sub-sampling operations, where each stage further has 2 sub-sampling paths called the delay-chains. 
Both the delay-chains in each stage of the FFAST framework have an identical sampling period but 
a circularly time-shifted input signal, i.e., consecutive shifts. In contrast, in R-FFAST we use a large 
number of random shifts: more precisely, an asymptotically large (i.e. O(log^re) number of delay chains 
per stage, where the input to each delay-chain is shifted by a random amount prior to being subsampled 
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(see Section VII). A random choice of the circular shifts endows the effective measurement matrix with 
good mutual incoherence and Restricted Isometry properties (RIP) [3], thus enabling stable recovery. 

As a motivating example, consider an application of a 2D version of the R-FFAST algorithm to acquire 
the Magnetic Resonance Image (MRI) of the ‘Brain’ as shown in Fig. I. In MRI, recall that the samples 
are acquired in the Fourier domain, and the challenge is to speed up the acquisition time by minimizing 
the number of Fourier samples needed to reconstruct the desired spatial domain image. The R-FFAST 
algorithm reconstructs the ‘Brain’ image acquired on an MR scanner from 60.18% of the Fourier samples 
as shown in Fig. 1(c). In Section IX-B, we elaborate on the details of this experiment. These results are not 
meant to compete with the state-of-the-art techniques in the MRI literature but rather to demonstrate the 
feasibility of our proposed approach as a promising direction for future research. More significantly, this 
experiment demonstrates that while the theory proposed in this paper applies to signals with a uniformly- 
random support model for the dominant sparse DFT coefficients, in practice, our algorithm works even 
for the non-uniform (or clustered) support setting as is typical of MR images. However, the focus of this 
paper is to design a robust sub-linear time algorithm to compute fe-sparse ID DFT of n-length signals 
from its noise corrupted time-domain samples. 

In many practical settings, particularly hardware-based applications, it is desirable or even required 
to have a fixed sampling structure (or measurement system) that has to be used for all input signals 
to be processed. It is this setting that we focus on in this work. Under this setting, we show that the 
proposed R-FFAST algorithm computes an n-length /c-sparse DFT X using 0{k\o^ n) noise-corrupted 
time-domain samples, in 0{k\og^ n) complex operations, i.e., sub-linear sample and time complexity. A 
variant of the R-FFAST algorithm presented in [2] is useful for applications that have the flexibility of 
choosing a different measurement matrix for every input signal (i.e. permit a randomized measurement 
system). Under this setting, which we do not elaborate on in this work, our algorithm can compute an 
n-length fe-sparse DFT using 0{k\og^^^ n) samples and 0{k\o^n) computations. We emphasize the 
following caveats for the results in this paper. First, we assume that the non-zero DFT coefficients of 
the signal x have uniformly random support and take values from a finite constellation (see Section II 
for more details). Secondly, our results are probabilistic and are applicable for asymptotic values of k, n, 
where k is sub-linear in n. Lastly, we assume an i.i.d Gaussian noise model for observation noise. 

The rest of the paper is organized as follows: In Section II, we provide the problem formulation and 
the signal model. Section III provides the main result of this paper. An overview of the related literature 
is provided in Section IV. In Section V, we use a simple example to explain the key ingredients of 
the R-FFAST architecture and the algorithm. Section VI, provides a brief review of some Frequency- 
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estimation results from signal processing literature, that are key to our proposed recovery algorithm. In 
Section VII and Section VIII, we provide a generic description of both the R-FFAST architecture as well 
as the recovery algorithm. Section IX provides experimental results that validate the performance of the 
R-FFAST algorithm. 


II. Signal model and Problem eormulation 


Consider an n-length discrete-time signal x that is a sum of /c << n complex exponentials, i.e., its 
n-length discrete Fourier transform has k non-zero coefficients: 


k-l 

= p = 0,l,...,n-l, (1) 

5=0 

where the discrete frequencies ^q G {0,1,..., n — 1} and the amplitudes X[iq\ G C. We consider the 
problem of computing the ^-sparse n-length DFT X of the signal x, when the observed time-domain 
samples y are corrupted by white Gaussian noise, i.e.. 


y = x + z, 


( 2 ) 


where z G CAA(0, Inxn)- 

Further, we make the following assumptions on the signal: 

• The number of non-zero DFT coefficients k = 0{n^), where 0 < 5 < 1, i.e., sub-linear sparsity. 

• Let Ai = {^j2 + and 0 = {27rz/M2}^Q~^, where p is a desired signal-to-noise 

ratio and Mi, M 2 are some constants. 

• We assume that all the non-zero DFT coefficients belong to the set A = I a G Ad,(^ G 0} 

(see Remark III.2 for a further discussion on this constraint). 

• The support of the non-zero DFT coefficients is uniformly random in the set {0,1,..., n — 1}. 

• The signal-to-noise ratio p, is defined as, p = [f’]P}/IE{||-?|p/n}. 

III. Main results 

The proposed R-FFAST architecture is targeted for applications, where a fixed sampling strucfure or 
front-end is used for all the input signals. It computes an n-length /c-sparse DFT X using 0{klog^ n) 
noise-corrupted time-domain samples, in 0{klog^ n) complex operations, i.e., sub-linear sample and 
time complexity. A variant of the R-FFAST algorithm that allows for flexibility of choosing a different 
measurement matrix for every input signal is presented in [2] and computes an n-length /c-sparse DFT 
X using 0(fclog^/^n) samples and in 0{k\o^n) computations. 
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Theorem III.l. For a finite but sufficiently high SNR, 0 < (5 < 1, and large enough length n, the R- 
FFAST algorithm computes a k-sparse DFT X G of an n-length signal x, where k = 0{n^), from 
its noise-corrupted time-domain samples y, with the following properties: 

• Sample complexity; The algorithm needs m = 0{klog^ n) samples of y. 

• Computational complexity; The R-FFAST algorithm computes DFT X using 0{k log^ n) arithmetic 
operations. 

• Probability of success; The algorithm perfectly recovers the DFT X of the signal x, with probability 
at least 1 — 0{l/k). 

Proof: Please see Appendix C. ■ 

Remark III.2. The R-FFAST algorithm reconstructs the k-sparse DFT X perfectly, even from the noise- 
corrupted time-domain samples, with high probability, since in our signal model we assume that X G 
A^. The reconstruction algorithm is equally applicable to an arbitrary complex-valued non-zero DFT 
coefficients, as long as all the non-zero DFT coefficients respect the signal-to-noise-ratio as defined in 
Section II. The proof technique for arbitrary complex-valued non-zero DFT coefficients becomes much 
more cumbersome, and we do not pursue it in this paper. However, in Section IX we provide an empirical 
evidence of the applicability of the R-FFAST algorithm to signals with arbitrary complex-valued DFT 
coefficients. The reconstruction is deemed successful if the support is recovered perfectly. We also note 
that the normalized ii-error is small (see Section IX for more details) when the support is recovered 
successfully. 


IV. Related work 

The problem of computing a sparse discrete Fourier transform of a signal is related to the rich 
literature of frequency estimation [4, 5, 6, 7] in statistical signal processing as well as compressive- 
sensing [8, 9]. Most works in the frequency estimation literature use well studied subspace decomposition 
principles e.g., singular-value-decomposition, like MUSIC and ESPRIT [4, 6, 7]. These methods quickly 
become computationally infeasible as the problem dimensions {k,n) increase. In contrast, we take a 
different approach combining tools from coding theory, number theory, graph theory and statistical signal 
processing, to divide the original problem into many instances of simpler problems. The divide-and- 
conquer approach of R-FFAST alleviates the scaling issues in a much more graceful manner. 

In compressive sensing, the bulk of the literature concentrates on random linear measurements, followed 
by either convex programming or greedy pursuit reconstruction algorithms [9, 10, 11]. A standard tool 
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used for the analysis of the reconstruction algorithms is the restricted isometry property (RIP) [3]. The 
RIP characterizes matrices which are nearly orthonormal or unitary, when operating on sparse vectors. 
Although random measurement matrices like Gaussian matrices exhibit the RIP with optimal scaling, 
they have limited use in practice, and are not applicable to our problem of computing a sparse DFT from 
time-domain samples. So far, to the best of our knowledge, the tightest characterization of the RIP, of a 
matrix consisting of random subset of rows of an n x n DFT matrix, provides a sub-optimal scaling, i.e., 
O(fclog^n), of samples [12]. In contrast, we show that by relaxing the worst case assumption on the 
input signal one can achieve a scaling of O(felogn) even for partial Fourier measurements. An alternative 
approach, in the context of sampling a continuous time signal with a finite rate of innovation is explored 
in [13, 14, 15, 16]. 

At a higher level though, despite some key differences in our approach to the problem of computing 
a sparse DFT, our problem is indeed closely related to the spectral estimation and compressive sensing 
literature, and our approach is naturally inspired by this, and draws from the rich set of tools offered by 
this literature. 

A number of previous works [17, 18, 19, 20] have proposed a sub-linear time and sample complexity 
algorithms for computing a sparse DFT of a high-dimensional signal. Although sub-linear in theory, 
these algorithms require large number of samples in practice and are robust against limited noise models, 
e.g., bounded noise. In [21], the authors propose an algorithm for computing a A:-sparse 2D-DFT of an 
y/nx y/n signal, where k = Q{y/n). For this special case, the algorithm in [21] computes 2D-D¥T using 
O(fclogn) samples and 0{klog^ n) computations. In contrast, the R-FFAST algorithm is applicable to 
ID signals and for the entire sub-linear regime of sparsity, i.e., k = O{n^),0 < 6 < 1, when the 
time-domain samples are corrupted by Gaussian noise. 

V. R-FFAST ARCHITECTURE AND ALGORITHM EXEMPLIFIED 

In this section, we describe the R-FFAST sub-sampling “front-end” architecture as well as the associated 
“back-end” peeling-decoder using a simple example. Later, in Section VII and Section VIII, we provide 
a generic description of the front-end architecture and the back-end algorithm. 

Consider an n = 20 length input signal x, whose DFT X, is k = 5 sparse. Further, let the 5 non-zero 
DFT coefficients of x be V[l], 2f[3], 2f[5], V[10] and V[15]. Let y = x+z, be a 20-length noise-corrupted 
time-domain signal, input to the R-FFAST architecture shown in Fig. 2. In its most generic form, the 
R-FFAST sub-sampling ‘front-end’ consists of d (> 3) stages, where d is a non-decreasing function of 
the sparsity-index 6 (where k = 0{n^) ). Each stage further has D number of subsampling paths or 
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R-FFAST 

R-FFAST front-end back-end 



stage-1 


Fig. 2. An example 2-stage R-FFAST architecture. The noise-corrupted time-domain samples y = x z, of a 20-point signal 
X with a 5 sparse DFT X, are processed using the R-FFAST front-end and the peeling back-end algorithm. 

delay-chains with an identical sampling period, e.g., all the delay-chains in stage 0 of the Fig. 2 have 
sampling period 5. The input signal to the different delay-chains is circularly shifted hy certain amount. 
For illustrative purposes, in Fig. 2, we show the processing of noise-corrupted samples y through a d = 2 
stage R-FFAST architecture, where the sampling period of the two stages are 5 and 4 respectively. Each 
stage further has D = 3 delay-chains. The input signal to each of the D = 3 delay-chains is circularly 
shifted hy 0, 2 and 9 respectively prior to the suh-sampling operation. The output of the R-FFAST front- 
end is obtained hy computing the short DFTs of the suh-sampled data and further grouping them into 
‘bins’, as shown in Fig. 2. The R-FFAST peeling-decoder then synthesizes the big 20-point DFT X 
from the output of the short DFTs that form the bin observations. The relation between the output of 
the short DFTs, i.e., bin-observations, and the DFT coefficients of the input signal x, can be computed 
using preliminary signal processing identities. Let yb^ij denote a 3-dimensional observation vector of bin 
j of stage i. Then, a graphical representation of this relation is shown using a bi-partite graph in Fig. 3. 
Left nodes of the bi-partite graph represent the 5 non-zero DFT coefficients of the input signal x, while 
the right nodes represent the “bins” with vector observations. We use fi to denote the number of bins 
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Fig. 3. A 2-left regular degree bi-partite graph representing relation between the unknown non-zero DFT coefficients, of the 
20-length example signal x, and the bin observations obtained through the R-FFAST front-end architecture shown in Fig. 2. Left 
nodes of the bi-partite graph represent the 5 non-zero DFT coefficients of the input signal x, while the right nodes represent 
the “bins” ( also sometimes referred in the sequel as “check nodes”) with vector observations. An edge connects a left node 
to a right node iff the corresponding non-zero DFT coefficient contributes to the observation vector of that particular bin. The 
observation at each check node is a 3-dimensional complex-valued vector, e.g., yb,o,o is observation at bin 0 of stage 0. 


in stage-i, for i = 0,1. An edge connects a left node to a right node iff the corresponding non-zero 
DFT coefficient contributes to the observation vector of that particular bin, e.g., after sub-sampling, the 
DFT coefficient Ai[10] contributes to the observation vector of bin 2 of stage 0 and bin 0 of stage 1. 
Thus, the problem of computing a 5-sparse 20-length DFT has been transformed into that of decoding 
the values and the support of the left nodes of the bi-partite graph in Fig. 3, i.e., sparse-graph decoding. 
Next, we classify the observation bins based on its edge degree in the bi-partite graph, which is then 
used to describe the FFAST peeling decoder. 

• zero-ton: A bin that has no contribution from any of the non-zero DFT coefficients of the signal, 
e.g., bin 0 of stage 0 in Fig. 3. 


yb,0,0 


^ ^Tb,o,o[0] ^ 
^ 6 , 0,0 [ 1 ] 
y R'b.o.op] y 


( 3 ) 


where Wb,ij G I^xs) consists of the DFT coefficients of the samples of the noise vector z. 
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• single-ton: A bin that has contribution from exactly one non-zero DFT coefficient of the signal, 
e.g., bin 2 of stage 0. Using the signal processing properties, it is easy to verify that the observation 
vector of bin 2 of stage 0 is given as. 


ybfi,2 — 


\ 


1 

„i27r20/20 

„i27r90/20 


X[10] + 


■^ 6 , 0,2 [ 1 ] 

\ «^fe,0,2[2] j 


( 4 ) 


multi-ton: A bin that has a contribution from more than one non-zero DFT coefficients of the signal, 
e.g., bin 1 of stage 0. The observation vector of bin 1 of stage 0 is, 

/ 1 \ / 1 \ /.„.„.rnl\ 


yb,o,i — 


1 

gi27r2/20 

gi27r9/20 


^[ 1 ] + 


1 

g*27rl0/20 

g*27r45/20 


A[5] + 


\ w'fe,o,i[ 2 ] ) 


( 5 ) 


Algorithm 1 R-FFAST Algorithm 

1 : Input: The noise-corrupted bin observations yb,ij, obtained using the R-FFAST sub-sampling front- 
end, for each bin j in stage i for all i,j. 


2 : Output: An estimate X of the fc-sparse re-point DFT. 

3: R-FFAST Decoding: Set the initial estimate of the re-point DFT X = 0. Let i denote the number of 
iterations performed by the R-FFAST decoder. 

4: Set the energy threshold T = (1 -|- 'y)D for appropriately chosen 7 (see Appendix C). 

5: for each iteration do 
6: for each stage i do 

7: for each bin j do 

8 : if < T then 

9: bin is a zero-ton. 

10 : else 

11 : (singleton, Vp, p) - Singleton-Estimator {yb,i,j)- 

12 : if singleton = ‘true’ then 

13: Peel-off: yb^s,q = yb,s,q — Vpa{p), for all stages s and bins q = p mod fg. 

14: Set, x\p] = Vp. 

15: else 

16: bin is a multi-ton. 

17: end if 

18: end if 

19: end for 

20 : end for 

21 : end for 
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R-FFAST Iterative peeling-decoder: The R-FFAST peeling decoder uses a function called “singleton- 
estimator” (see Section VIII for details) that exploits the statistical properties of the hin observations to 
identify which bins are singletons and multitons with a high probability. In addition, if the bin is a singleton 
bin, it also determines the value and the support of the only non-zero DFT coefficient connected to this 
bin. The R-FFAST peeling decoder then repeats the following steps (see Algorithm 1 for the pseudocode): 

1) Select all the edges in the graph with right degree 1 (edges connected to single-ton bins). 

2) Remove these edges from the graph as well as the associated left and right nodes. 

3) Remove all the other edges that were connected to the left nodes removed in step-2. When a 
neighboring edge of any right node is removed, its contribution is subtracted from that check node. 

Decoding is successful if, at the end, all the edges have been removed from the graph. It is easy to verify 
that the R-FFAST peeling-decoder operated on the example graph of Fig. 3 results in successful decoding, 
with the coefficients being uncovered in the following possible order: V[10],2f[l],X[3],X[5],X[15]. 

The success of the R-FFAST peeling decoder depends on the following two crucial assumptions: 

1) The bipartite graph induced by the R-FFAST sub-sampling front-end is such that a singleton based 
iterative peeling decoder successfully uncovers all the variable nodes. 

2) The function “singleton-estimator” reliably determines the identity of a bin using the bin-observations 
in the presence of observation noise. 

In [I], the authors have addressed the first assumption, of designing the FFAST sub-sampling front- 
end architecture, guided by the Chinese-remainder-theorem (CRT), such that the induced graphs (see 
Fig. 3 for an example) belong to an ensemble of graphs that have a high probability of being decoded 
by a singleton based peeling decoder. For completeness, we provide a brief description of the FFAST 
sub-sampling front-end design in Section VII, and refer interested readers to [I] for a more thorough 
description. The second assumption of designing a low-complexity and noise-robust “singleton-estimator” 
function is addressed in Section VIII. First, we review some results from the signal processing literature 
of Frequency estimation in the next section, that are key to designing the “singleton-estimator” algorithm. 

VI. Frequency estimation: A single complex sinusoid 

The problem of estimating the frequency and amplitude of a single complex sinusoid from noise 
corrupted time-domain samples has received much attention in signal processing literature. The optimal 
maximum likelihood estimator (MLE) is given by the location of the peak of a periodogram. However, 
the computation of periodogram is prohibitive even with an FFT implementation, and so simpler methods 
are desirable. In [22], the author used a high SNR approximation from [23], to propose an estimator 
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that is computationally much simpler than the periodogram and yet attains the Cramer-Rao hound for 
moderately high SNRs. Next, we hriefly review the high SNR approximation from [23], along with the 
fast sinusoidal frequency estimation algorithm from [22]. 

Consider N samples of a single complex sinusoidal signal hurled in white Gaussian noise, obtained 
at a periodic interval of period T, 

= +w{t), t = 0,T,2T,...,{N -l)T, (6) 

where the amplitude A, frequency w, and the phase cf) are deterministic hut unknown constants. The 
sampling period T determines the maximum frequency Wmax C (0)27r/T), that can he estimated using 
these samples. Henceforth, for ease of notation WLOG we assume that T = 1, i.e., Wmax £ (0, 27r). The 
sample noise w{t) is assumed to he a zero-mean white complex Gaussian noise with variance cr^. 



Fig. 4. Consider a complex number 1 +u, where v = vji+ jvj and |i;| << 1. Then, a high SNR approximation of a complex 
number 1 -|- w is given by exp{jvi) [23]. 

1) High SNR approximation of the phase noise [23]: Let the signal-to-noise-ratio SNR p he defined 
as p = Consider the model of (6), 

y{t) = + w{t) 

= (1 + (7) 

where v{t) = /A, is a zero-mean complex normal random variable with variance 1/p. For 

high SNR, i.e., p » 1, we can approximate the complex number (1 -|- v{t)) as (also see Fig. 4), 

(l + i;(i)) = ^(1 + VR{t)Y + tan-Tu,(t)/(u„(t)+l)) 

~ lgitan-Tu/(t)/(uH(t)-|-l)) 

( 8 ) 
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where the phase vi{t) is a zero mean Guassian random variable with variance 1 / 2 / 9 . The high SNR 
approximation model was first proposed in [23]. In [22], the author empirically validated that the high 
SNR approximation holds for SNR’s of order 5 —7dB. Thus, under high SNR assumption the observations 
in ( 6 ) can be written as, 

2 /(f) (9) 

where u{t) is zero-mean white Gaussian with variance l/{2p). 

2) Frequency Estimator [22]: Using the high SNR approximation model of (9), we have, 

+ - Zy{t), t = 0,l,...,N-2 

= ui + u{t + 1) — u{t) 

i.e., A = 1 oj + z, (10) 

where z G is a zero-mean colored Gaussian noise. The author in [22] then computed the following 

MMSE rule to estimate the unknown frequency to, 

N-2 

A = ^ l3{t)Zy{t + l)y^(f), (11) 

i =0 

where the weights /3(f) have closed form expression (see [22]) and are computed using the optimal 
whitening filter coefficients of an MMSE estimator. In [22], the author shows that the frequency estimate 
of (II) is an unbiased estimator and is related to the true frequency lo as, 

( 12 ) 

where {■) 2 t 7 is a modulo 27r. Equation (12) essentially holds for any value of T, i.e., periodic sampling. 

In [22], the author considered a problem where the unknown frequency cj is a real number in an 
appropriate range, e.g., u G (0, 27r). Hence, the result in (12) is expressed in terms of an estimation error 
as a function of the number of measurements N and the signal-to-noise ratio p. In contrast, in this paper 
we are interested in frequencies of the form 2'Klln, for some unknown integer £ G [0, n — 1]. Hence, we 
use the Gaussian Q-function along with the estimate in (12) to get the following probabilistic proposition. 

Proposition VI.l. For a sufficiently high SNR and large n, the frequency estimate u obtained using (11) 
satisfies, 

Pr{\Lo — uj\> vr/ci) < l/v?, 
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VII. R-FFAST FRONT-END ARCHITECTURE 



CRT-guided set of uniform 
sampiing periods 


Fig. 5. A block diagram of the R-FFAST architecture for processing noise-corrupted samples y = x + z. The n-point input 
signal y is uniformly subsampled by a carefully chosen, guided by the Chinese-Remainder-Theorem, set of d patterus. Each 
stage has D delay-chains and a delay-chain in stage i has fi number of samples. The delay shift pattern (ro, ri,..., ro-i) is 
carefully chosen so as to induce bin-measurement matrices with “good” mutual incoherence and RIP properties (as explained 
in Section VII-B). Next, the (short) DFTs, of each of the delay-chain samples are computed using an efficient FFT algorithm 
of choice. The big n-point DFT X is then synthesized from the smaller DFTs using the peeling-like R-FFAST decoder. 


In its most generic form the R-FFAST front-end architecture consists of d > 3 stages and D delay- 
chains per stage as shown in Fig. 5. The delay-chain in stage-j, circularly shifts the input signal hy 
Ti and then suh-samples hy a sampling period of n/ fj, to get fj output samples. Using identical circular 
shifts Ti in the delay-chain of all the stages is not a fundamental requirement of our architecture, hut 
a convenience that is sufficient, and makes the R-FFAST framework implementation friendly. Next, the 
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(short) DFTs of the output samples of eaeh delay-ehain are computed using an efficient FFT algorithm 
of choice. The hig n-point DFT X is then synthesized from the smaller DFTs using the peeling-like 
R-FFAST decoder. 

A. Number of stages and sampling periods 

The number of stages d and the suh-sampling factor’s n//i, for f = 0,..., d—1, are chosen based on the 
sparsity-index 6, where k = O(n^),0 < 6 < 1. Based on the qualitative nature of the design parameters, 
of the R-FFAST front-end architecture, we identify two regimes of sub-linear sparsity; 1) very-sparse 
regime, where 0 < <5 < 1/3, and 2) less-sparse regime, where 1/3 < <) < 1. The role of the number 
of stages and the subsampling factors used in each stage of R-FFAST front-end is to guarantee that the 
frequency domain aliasing sparse bi-partite graph (see Fig. 3 for an example) is successfully decoded by 
a singleton based peeling-decoder described in Section V, with a high probability. The choice of number 
of stages and the subsampling factors in R-FFAST front-end architecture is identical to that of the FFAST 
front-end architecture [1]. Here, we provide a brief description about the choice of these parameters and 
refer interested reader’s to [1] for more details. 

• Very-sparse regime: For the very-sparse regime we always use d = 3 stages in the R-FFAST 
front-end. The sampling periods n/fi, for stages i = 0,1,2, are chosen such that the integers 
fi = 0{k) -\- 0(1), i.e., are approximately equal to k, and are relatively co-prime factors of n. For 
example, n = 100*101*103*99, k = 100, /o = 100,/i = 101 and /2 = 103, achieves an operating 
point of <5 = 1/4. 

• Less-sparse regime: For the less-sparse regime the number of stages d is a monotonic non-decreasing 

function of the sparsity index 1/3 < <5 < 1. In particular, an operating point of fractional sparsity 
index 6 can be achieved as follows. Let the number of stages d = 1/(1 — 5). Consider a set 
of d relative co-prime integers Vi = such that n = ]/[^J J^Vi = 0{kf/^). Then, the 

subsampling periods n/fi are chosen such that f = YYjtn~‘^ 'P{j)d’ where (y)^ = j mod d. For 
example, n = 10 * 11 * 13, /c = 100 and /o = 10 * 11, /i = 11 * 13, /2 = 13 * 10, achieves 6 = 2/3. 

B. Delay chains and the bin-measurement matrices 

In the FFAST front-end architecture proposed in [1], for the noiseless observations, we have shown 
that D = 2 delays are sufficient to detect the identify of a bin from its observations. On the contrary, in 
the presence of additive noise, as will be shown in the sequel, each stage of the R-FFAST architecture 
needs many more delay-paths to average out the effect of noise. In order to understand the role of the 
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number and the structure of the delay-chains, used in the R-FFAST front-end architecture, we take a 
detailed look at the bin-observations of the example of Section V. 

Consider the observation vector shown in (5), of the bin 1 of stage 0, 


2 / 6 , 0,1 


/ 1 

gi27r2/20 


\ / 

^[ 1 ] + 


V 


„i27r9/20 


1 

gi27rl0/20 

i27r45/20 


\ / 

X[5] + 


V 


'«^ 6 , 0 ,l[ 0 ] 
106 , 0,1 [ 2 ] 


\ 

/ 


^ a(l) d(5) o(9) a(13) o(17) ^ 

^ 3f[l] ^ 
X[5] 

X[9] 

-h 

^ Wl6,0,l[0] ^ 
Wl6,0,l[l] 


X[13] 

iv ^[17] y 


\ W'6,0,i[2] / 


^ 0 , 1 -? + 106 , 0 , 1 ) 


where the Ao,i 
given by 


is referred to as a “bin-measurement” matrix of bin 1 of stage 


0 , and its column is 


/ 


1 \ 


a{i) = < 


V 


g*27r2r/20 

g*27r9r/20 


/ 


if £ = 1 mod 4 


0 


otherwise, 


for £ = 0,1,, 19. In the sequel, we refer to a{i), £ = 0,1..., 19, as a steering-vector corresponding 
to the frequency 27r^/20, sampled at times 0,2 and 9. 

More generally, the observation vector yb,i,j, of bin j of stage / of a R-FFAST front-end architecture 
with d stages and D delay-chains per stage, is given by. 


= AijX -h Wb,i,j, 0<i <d, 0<j<fi, (13) 

where fi is the number of samples per delay-chain in stage i, Wb,i,j ~ CM{0, Idxd) and Aij is the 
bin-measurement matrix. The bin-measurement matrix A* j is a function of the sampling period, number 
of delay-chains as well as the circular shifts used in stage i of the R-FFAST sub-sampling 


January 5, 2015 


DRAFT 
















16 


front-end and its column is given by 


/ r27r^rn 227r^ri 

1 e " , e " , •. 

i2Tr£.r rv 1 \ "^ 

., e - j if ^ = j mod fi 


I 


(14) 

0 

v 

otherwise, 



Note that, the column of the bin-measurement matrix is either zero or a sinusoidal frequency 2tt £/n 
sampled at times 

In (13), both the measurement matrix and the DFT vector are sparse. The singleton-estimator function 
attempts to determine if the bin observation corresponds to a 0-sparse (zeroton) or 1-sparse 

(singleton) or L-sparse {L > 2 multi-ton) signal. In the case when the bin observation corresponds 
to a singleton bin, the function singleton estimator also tries to recover the l-sparse signal. Thus, the 
bin estimation problem is somewhat similar to compressed sensing problem. In the compressed sensing 
literature the mutual-incoherence and the Restricted-isometry-property (RIP) [3], of the measurement 
matrix, are widely studied and well understood to play an important role in stable recovery of a sparse 
vector from linear measurements in the presence of observation noise. Although, these techniques are not 
necessary for our bin-estimation problem, as we will see in the following, they are certainly sufficient. 
Next, we define fhe mufual-incoherence and fhe RIP of fhe bin-measuremenf mafrices. Lafer, we use fhese 
properties to prove a stable recovery of the proposed R-FFAST peeling-style iterative recovery algorithm. 


Definition VII.l. The mutual incoherence /rniax(^) of a measurement matrix A is defined as 

f A\ A \aipVa{q)\ 

/^max(-^) — max 11 _ 11 11 . 11 , 

||a(p)|| • ||o(g)|| 


(15) 


where a{p) is the p^^ column of the matrix A. 


The mutual-incoherence property of the measurement matrix indicates the level of correlation between 
the distinct columns of the measurement matrix. Smaller value of praax{A) implies more stable recovery, 
e.g., p,raa.x{A) = 0 for an orthogonal measurement matrix A. 

Definition VII.2. The restricted-isometry constant 7 s (A) of a measurement matrix A, with unit norm 
columns, is defined to be the smallest positive number that satisfies 

(1 - 7 s ( A ))|| X ||2 < ||^^||2 < (1 + ^^( a ))|| X || 2 , ( 16 ) 

for all X such that ||A||o < s. 


The RIP characterizes the norm-preserving capability of the measurement matrix when operating on 
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sparse vectors. If the measurement matrix A has a good RIP constant (small value of 72 s (A)) for all 
2s-sparse vectors, then a stable recovery can he performed for any s-sparse input vector [24]. Since, a 
small value of 72 s(A) implies that ||A(Xi — A 2 )|p is hounded away from zero for any two distinct, 
Xi ^ X 2 , s-sparse vectors. 

1) Randomness and reliability: As pointed out in earlier discussion, a smaller value of mutual in¬ 
coherence corresponds to more reliahility or stable recovery. If the circular shifts used in the 

D delay chains of the R-FFAST front-end architecture are chosen uniformly at random between 0 and 
n — 1 , then. 


Lemma VII.3. The mutual incoherence /rjnax(Ajj) of a bin-measurement matrix Aij is upper bounded 
by 

fimax(Aij) < 2v^log(5n)/L> , V 7 ,j (17) 


with probability at least 0.2. 

Proof: Please see Appendix A. ■ 

Also, it is easy, i.e., 0{nD) complexity, to verify if a given choice of ro,..., vd-i satisfies (17) or 
not. Hence, using offline processing one can choose a pattern (ro,..., rr)_i), such that deterministically 
the bound in (17) is satisfied. In the following Lemma, we use the Gershgorin circle theorem [25] to 
obtain a bound on the RIP condition of a matrix as a function of its mutual incoherence. 

Lemma VII.4. A matrix A, with unit norm columns and mutual incoherence /Umax(A), satisfies the 
following RIP condition for all X that have ||A||o < s, 

(l-^max(A)(s-l))+||X||2 < ||AX||2 < (l+^ma.(A)(s-l))||X||2 

Proof: Please see Appendix B. ■ 

2) Sinusoidal structured bin-measurement matrix for speedy recovery: In Section VII-B, we have 
shown that the columns of a bin-measurement matrix A, j G £^Dxn determined by circular shifts 
used in each of the delay-chains in the R-FFAST front-end architecture. In particular, as shown in (14), 
the column of a bin-measurement matrix is either zero or a sinusoidal signal with frequency 27rl /n 
sampled at random times corresponding to the circular shifts used in the R-FFAST front-end. 

Although a uniformly random choice of circular shifts endows the bin-measurement matrix with good 
coherence properties, as shown in Lemma VII.3, the lack of structure makes the decoding process, i.e.. 
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bin-processing algorithm, painfully slow. In contrast, if the columns of the bin-measurement matrix consist 
of equispaced samples of sinusoidal frequency 21^1 jn, then as seen in Section VI, the sinusoidal signal 
can be detected using only linear operations, but the mutual incoherence properties are not sufficient 
enough for reliable decoding. 

In this section, we propose a choice of the circular shifts, used in the R-FFAST front-end architecture, 
that is a combination of the randomness and structured sampling that enables us to achieve the best 
of both worlds. We split the total number of delay-chains D into C clusters, with each cluster further 
consisting of N delay-chains, i.e., D = NC. The first delay-chain of each cluster circular shifts the input 
signal by rj, z = 0,..., C — 1, a uniformly random number between 0 and n — 1. The delay-chain of 
the cluster circularly shifts the input signal by -|- (j — 1)2*, i.e., equisapced samples with spacing 
of 2* starting from r*. The colunm of the resultant bin-measurement matrix is given by, 

^ ^z2nt{ro+0)/n 

^i2TT£{ro+l)ln 


a{£) 


^i2Tr£(ro+iN-l))/n 
^i2n£{ri+0)/n 
^i2w£{ri+2)/n 


(18) 


^i2ni{ri+2{N—1)) / n 


\ 


^i2Tr£{rc-i+2'^ ^{N-l))/n 




In next section, we describe a bin-processing algorithm that exploits the equispaced structure, of the 
intra-cluster samples, to estimate the support of a singleton bin in linear time of the number of samples, 
i.e., 0{D). 


VIII. R-FFAST BACK-END PEELING ALGORITHM 

The R-FFAST back-end peeling decoder recovers the big n-point DFT X from the smaller DFTs output 
by the R-FFAST front-end, i.e., bin-observations, using an onion peeling-like algorithm as described 
in Algorithm 1. One of the crucial sub-routine used by the R-FFAST back-end peeling algorithm 
is “singleton-estimator”. The function singleton estimator exploits the statistical properties of the bin 
observations to identify which bins singletons and multitons with a high probability. In addition, if 
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the bin is a singleton bin, it also determines the value and the support of the only non-zero DFT 
coefficient connected to this bin. In Section V, we explained the operation of the R-FFAST back-end 
peeling algorithm using a simple example. In this section, we focus on the description of the “singleton- 
estimation” function. 

The singleton-estimator function estimates a single column of the bin-measurement matrix (or equiv¬ 
alently the frequency a; = 27rf'/n) that best describes (in squared error sense) the bin observations. It 
does so using a two step procedure. First, the frequency estimation algorithm in [22] (i.e., estimator in 
(11)) is used to process the observations corresponding to each cluster, to obtain C estimates coi for 
i = 0,... ,C — 1. Then, these estimates are fused together to obtain the final frequency esfimate to 
that corresponds to the support of the potential singleton bin. If the estimated column does not justify 
the actual bin observations to a satisfactory level (based on an appropriately chosen threshold) then the 
processed bin is declared to be a multi-ton bin. 

1) Processing observations in a cluster: Consider processing the observations of cluster i. 

Ai{t) = Zy{t + 1) - Zy{t), t = Ni, Ni + 1,..., N{i + 1) - 2 
= 2^ U! + u{t + 1) — u{t) 

Ai = 12^u + z, (19) 

where we have used the high SNR approximation model to get z G as a zero-mean colored 

Gaussian noise. Further, applying the MMSE rule of (11) we get an estimate uii of 2^io as follows. 

Thus, each cluster of observations provides an estimate (up to modulo 27r) of a multiple of the true 
frequency to. Next, we show how these different estimates can be used to successively refine fhe search 
space of the true frequency to. 

2 ) Successive refinement: Let = (w* — vr /ci, w* -|- tt/ci ) be a range of frequencies around the 

estimate w* that contains the true frequency 2*a; with high a probability (as shown in proposition VI. 1), 
obtained by processing observations of cluster-f. For a continuous set, with a slight abuse of notation we 
use I • I to denote the length of the interval, e.g., |flo| = 27r/ci. The operation of multiplying a frequency 
by 2, maps two frequencies 6 and O+tt to the same frequency, i.e., (20)2** = (2(0-|-7r))27r. However, since 
|llo| = for a sufficiently large constant ci, |llonfli/2| < 27r/(2ci). Similarly, the intersection of 
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Algorithm 2 singleton-estimator 
1 : Inputs: 


• A noise-corrupted bin observations y?, G consisting of C clusters each with N number of 

measurements. 

• MMSE filter coefficients {/3(0),..., j3{N — 2)}. 


2 : Outputs: 1) A boolean flag ‘singleton’, 2) If singleton, then estimate of the value Vp of the non-zero 
DFT coefficient and the position (or support) p of the non-zero DFT coefficient. 

3) Set — d" 

4: Set the singleton = ‘false’. 

5: Set D = NC. 

6 : for each cluster f = 0 ,..., C — 1 do 

7: UJnew = - Ni)Zyb{t + l)yl{t). 

8: = r(27r/2*) -h UJnew - UJprev 

9: S 2 = (27r/2*) -I- UJnew “ UJprev 

10: if |(5i| < |(i 2 | then 

11 : ^prev — ^1 “ 1 “ ^prev 

12 : else 

13: ^prev — ^2 + ^prev ■ 

14: end if 

15: end for 

16: Set the support estimate q = round{ujprevn/2 tt). 

17: Set the energy threshold T = {1 + "i)D for an appropriately chosen 7 (see Appendix C). 

18: Vq = d{q)^yb/D. 

19: if ||y — Uga((?)|p < T then 
20 : singleton = ‘true’. 

21 : p = q and Vp = Vq. 

22 : end if 


estimates from C consecutive clusters is. 


a/2*I < 


27r 


2^-1 Cl' 


( 21 ) 


Further, for a singleton bin, we know that the frequency corresponding to the location of the non-zero 
DFT coefficient is of the form 27r£/n for some integer £ G {0,..., n — 1}. Hence, if the number of clusters 
C = O(logn) and the number of samples per cluster^ N = O(log^'^^n), then the singleton-estimator 
algorithm correctly identifies fhe unknown frequency wifh a high probability (see proposition VIII. 1). 


^In construction of the measurement matrix (18), as well as in algorithm 2, we assumed that the signal length n is not an 
even number. If n is an even number, then one can use some other prime, not a factor of n, in (18) and in algorithm 2. For 
example, if n is even but does not have 3 as a factor, then the samples in cluster are equispaced with difference being 3*, 
instead of 2* as suggested in (18). 
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The pseudo code of the algorithm is provided in Algorithm 2. 

Proposition VIII.l. For a sufficiently high SNR and large n, the singleton-estimator algorithm with 
C = O(logn) clusters and N = samples per cluster, i.e., D = 0(log^/^ n) samples, correctly 

identifies the support of the non-zero DFT coefficient contributing to the singleton bin with probability 
at least 1 — 0{l/n?). The algorithm uses no more than 0(log‘^/^n) complex operations. 

Proof: Please see Appendix F ■ 


IX. Simulation Results 

In this section, we first evaluate the performance of the R-FFAST algorithm on synthetic data and 
contrast the results with the theoretical claims of Theorem III. 1 . The synthetic data used for evaluation is 
heyond the theoretical claims of Theorem III.l. In particular, the DFT coefficients of the used synthetic 
signal have arbitrary phase, instead of a finite constellation as assumed in Section II. For arbitrary 
complex-valued DFT coefficients one cannot expect to perfectly reconstruct the DFT X from noise- 
corrupted time-domain samples. A reconstruction is deemed successful, if the support of the non-zero 
DFT coefficients is recovered perfectly. Additionally, we observe that the normalized f'l-error is small 
compared to 1 when support recovery is successful. 

In addition to evaluating the performance of the R-FFAST algorithm on synthetic signals, in Sec¬ 
tion IX-B, we also provide details of the application of a 2D version of the R-FFAST algorithm to 
acquire the Magnetic Resonance Image (MRI). Application of the R-FFAST algorithm to acquire MRI is 
not meant to compete with the state-of-the-art techniques in the MRI literature, but rather to demonstrate 
the feasibility of our proposed approach as a promising direction for future research. More significantly, 
this experiment demonstrates that while the theory proposed in this paper applies to signals with a 
uniformly-random support model for the dominant sparse DFT coefficients, in practice, our algorithm 
works even for the non-uniform (or clustered) support setting as is typical of MR images. 

A. R-FFAST performance evaluation for synthetic signals 

In [I], we have theoretically analyzed as well as empirically validated the scaling of the number 
of samples m required by the FFAST algorithm, as a function of the sparsity k. In this section, we 
empirically evaluate the scaling of the number of samples m and the processing time, required by the 
R-FFAST algorithm to successfully compute the DFT X, as a function of the ambient signal dimension 
n, see Fig. 6. 
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Samples m Vs Signal Length n 
Fixed: k = 40, SNR = 5dB and P(success) > 0.99 



Signal length: n xio" 


R-FFAST total time complexity Vs Signal Length n 



(a) Sample complexity of the R-FFAST algorithm (b) Time complexity of the R-FFAST algorithm as 
as a function of signal length n. a function of signal length n. 


Fig. 6. The plots show the scaling of the number of samples m and the total time required by the R-FFAST algorithm to 
successfully reconstruct a fc = 40 sparse DFT X, for increasing signal length n. For a fixed support recovery probability of 
0.99, sparsity k and SNR of ^dB, we note that both the sample complexity as well as time complexity of the R-FFAST scale 
sub-linearly in n. 


Simulation Setup.- 

• An n-length DFT vector X with A: = 40 non-zero coefficients is generated. The non-zero DFT 
coefficients are chosen to have a uniformly random support, fixed amplitude hut an arbitrary random 
phase. The input signal x is obtained by taking the IDFT of X. The length of the signal n is varied 
from n = 49 * 50 * 51 « 0.1 million to n = 49 * 50 * 51 * 12 « 1.49 million, i.e., 12x. 

• The noise-corrupted signal y = x + z, where z is such that the effective signal-to-noise ratio is 5dB, 
is processed through a d = 3 stage R-FFAST architecture with D = NC delay-chains per stage. 
The sampling periods of the 3 stages in the R-FFAST architecture are 50 * 51, 51 * 49 and 49 * 50 
respectively. This results in the total number of bins in the R-FFAST front-end architecture to be 
Uh = 150. 

• As the signal length n varies, the number of clusters C and the number of samples per cluster N 
are varied, to achieve at least 0.99 probability of successful support recovery. 

• Each sample point in the plot is generated by averaging over 1000 runs. 

1) R-FFAST sample complexity: In proposition VIII. 1, we have shown that C = O(logn) clusters and 
N = 0(log^^^ n) samples per cluster are sufficient for decoding a singleton bin correctly using the bin¬ 
processing Algorithm 2. In the empirical evaluation of the R-FFAST, we observed that = 3 log^/^(n) 

and C = [9 : 12] Rs log(n), were sufficient to successfully recover the support of the DFT X with 
probability more than 0.99. Thus, the overall empirical sample complexity m k, 0{k log^^^ n), as shown in 
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Fig. 6(a). Note that, the empirical sample complexity performance coincides with the sample requirement 
for reliable decoding of a singleton-hin as pointed out hy proposition VIII. 1 and is better than the 
theoretical claims of Theorem III.l. We believe that this discrepancy is mainly due to the weaker analysis 
of a multi-ton bin error event in Appendix C. 

2) R-FFAST time complexity: In Fig. 6(b), we plot the total time, sum of the time required for the 
front-end processing as well as the back-end peeling algorithm, required by the R-FFAST to successfully 
recover the DFT X, as a function of increasing signal length re. Note that, a 12x fold increase in the 
ambient signal dimension re resulted in mere 30% increase in the time complexity of the R-FFAST 
algorithm. Thus, empirically substantiating the claim of sub-linear in signal length re time complexity of 
the R-FFAST algorithm. 

B. Application of the R-FFAST for MR imaging 

The ID R-FFAST architecture proposed in this paper can be generalized, in a straightforward manner, 
to 2D signals, with similar performance guarantees. In this section, we apply the 2D R-FFAST algorithm 
to reconstruct a brain image acquired on an MR scanner^ with dimension 504 x 504. In MR imaging, 
the samples are acquired in the Fourier domain and the task is to reconstruct the spatial image domain 
signal from significantly less number of Fourier samples. To reconstruct the full brain image using 2D 
R-FFAST, we perform the following two-step procedure: 

• Differential space signal acquisition-. We perform a vertical finite difference operation on the image 

by multiplying the 2D-DVI signal with This operation effectively creates an approximately 

sparse differential image in spatial domain, as shown in Fig. 7(e), and can be reconstructed using 
R-FFAST. Note, that the finite difference operation can be performed on the sub-sampled data, 
and at no point do we need to access all the input Fourier samples. The differential brain image 
is then sub-sampled and reconstructed using a 3-stage 2D R-FFAST architecture. Also, since the 
brain image is approximately sparse, we take 15 delay-chains in each of the 3 stages of the 2D R- 
FFAST architecture. The R-FFAST algorithm reconstructs the differential brain image using 56.71% 
of Fourier samples. 

• Inversion using fully sampled center frequencies: After reconstructing the differential brain image, 
as shown in Fig. 7(f), we invert the finite difference operation by dividing the 2Z7-DFT samples 

^The original MR scanner image is not pre-processed, other than retaining an image of size 504 x 504. 
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(a) Log intensity plot of the 2_D-DFTof (b) Original ‘Brain’ image in spatial (c) Reconstructed ‘Brain’ image using 
the original ‘Brain’ image. The red domain. the 2D R-FFAST architecture along 

enclosed region is fully sampled and with the fully sampled center frequen- 

used for the stable inversion. cies. The total number of Fourier sam¬ 

ples used is 60.18%. 



(d) Log intensity plot of 2D-DFT of (e) Differential ‘Brain’ image obtained (f) Differential ‘Brain’ image recon- 
the original ‘Brain’ image, after appli- using the vertical difference operation structed using the 2D R-FFAST archi- 
cation of the vertical difference opera- on the original ‘Brain’ image. lecture with 56.71% of Fourier sam- 

tion. pies. 

Fig. 7. Application of the 2D R-FFAST algorithm to reconstruct the ‘Brain’ image acquired on an MR scanner with dimension 
504 X 504. We first reconstruct the differential ‘Brain’ image shown in Fig. 7(e), using d — 3 stage 2D R-FFAST architecture 
with 15 random circular shifts in each of the 3 stages of the R-FFAST architecture. Additionally we acquire all the Fourier 
samples from the center frequency as shown, by the red enclosure, in Fig. 7(a). Then, we do a stable inversion using the 
reconstructed differential ‘Brain’ image of Fig. 7(f) and the fully sampled center frequencies of Fig. 7(a), to get a reconstructed 
full ‘Brain’ image as shown in Fig. 7(c). Our proposed two-step acquisition and reconstruction procedure takes overall 60.18% 
of Fourier samples. 


with 1 — Since the inversion is not stable near the center of the Fourier domain, only the 

non-center frequencies are inverted and the center region is replaced hy the fully sampled data. 

• Overall, the 2D R-FFAST algorithm uses a total of 60.18% of Fourier samples to reconstruct the 
hrain image shown in Fig. 7(c). 
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Appendix A 

Mutual incoherence bound 

Let a{p) be a column of the bin-measurement matrix. Then, using the structure of a{p) from (14), 
we have, 

< max^|a(p)ta(g)| 

D-l 


= max-—I exp(z27rf'rs/n)| 


£7^0 D 


s=0 


= meixa(i) , 
£7^0 


( 22 ) 


where p{£) = \ exp(z27r('rs/n)|/L). 

Now consider the summation cos{i2Trirs/n)/D for any fixed i ^ 0. According to the hypothesis 

of the Lemma VIL3, circular shifts {rs}^^ are uniformly at random between 0 and n — 1. Hence, 
the terms in the summation, i.e., {cos{i2TTirs/n)/D}f~Q, are i.i.d random variables with zero-mean 
and hounded support, given hy [—1/D, l/D], Using Hoeffding’s inequality for the sum of independent 
random variables with bounded support we have, for any f > 0 , 

D-l 

Pr{\ cos{i2ntrs/n)/D\ > t) < 2exp(—f^i2/2). 


Similarly, 


s=0 


D-l 


Pr{\ sm.{i2'K£rs/n)/D\ > t) < 2exp(—f^i2/2). 


s=0 


Applying a union bound over the real and the imaginary parts of the summation term in p{l), we get. 


Pr{p{i) > V2t) < 4exp(—f^D/ 2 ). 


(23) 


Further applying a union hound over all £ 7 ^ 0, we have. 


PripmsiA^ij) > < 4nexp(-f^D/2) 

= 0.8 


for t = ^y2log{5n)/D. Thus, over all the random choices of the circular shifts rg,...,r£)_i, with 
probability at least 0 . 2 , 

f^max(-'4-ij) < 2^1og(5?T.)/Z) , V i,j- 
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Appendix B 

Restricted-isometry-property 

Consider a s-sparse vector X and a matrix A. Using basic linear algebra we get the following inequality, 

A„,in(AtA)||A||2 < ||A;?||2 < A„,ax(AtA)||A||2 

The Gershgorin circle theorem [25], provides a bound on the eigen-values of a square matrix. It states 
that, every eigen-value of a square matrix lies within at least one of the Gershgoring discs. A Gershgorin 
disc is defined for each row of the square matrix, with diagonal entry as a center and the sum of the 
absolute values of the off-diagonal entries as radius. Note that by hypothesis of the Lemma VII.4, the 
diagonal entries of the matrix A^A are equal to 1. Hence, /rmax(A) provides an upper bound on the 
absolute values of the off-diagonal entries of A^^A. 

Then, using the Gershgorin circle theorem we have, 

(l-/in,ax(A)(s-l))+||A||2 < ||AA||2 < (1 + ^„,ax( A) (s - 1)) | |X| | ^. 


Appendix C 

Proof of Theorem III. 1 

In this section we provide a proof of the main result of the paper. The proof consists of two parts. In the 
first part of the proof, we show that the R-FFAST algorithm reconstructs the DFT X, with probability 
at least 1 — 0{l/k), using m = 0{k\o^ n) samples. In the second part of the proof, we show that 
computational complexity of the R-FFAST decoder is 0{k\og^ n). 

A. Reliability Analysis and sample complexity of the R-FFAST 

1) Reliability analysis: Let Ej denote an event that the R-FFAST algorithm fails to reconstruct the 
DFT X. Let Ef, be an event that the Algorithm 2, after processing the observations of a bin, fails to 
correctly classify the bin as a zero-ton, a singleton or a multi-ton bin. In the case of a singleton bin 
successful decoding also involves identifying the support and the value of the contributing non-zero DFT 
coefficient correctly. Now, if we use E to denote the event that during the entire decoding process there 
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exists some bin for which Algorithm 2 make a wrong decision, then putting the pieces together, we get, 

Pr{Ef) < Pr{E) + Pr{Ef \ E) 

< Pr(E) + 0(l/k) (24) 

where in (a), we used the fact that when there are no bin-processing errors in the entire decoding 
procedure, i.e., event E occurs, the R-FFAST algorithm performs identical to the noiseless FFAST 
algorithm proposed in [1]. In Section VII-A, we have shown that the R-FFAST front-end architecture, 
by construction, outputs 0(k) bins each with a vector observation. In [1], we show that the FFAST 
peeling-decoder processes each bin only constant number of times, i.e., peeling-iterations are constant. 
Thus, we bound the probability of the event E using a union bound over the constant number of iterations 
required for the R-FFAST algorithm to reconstruct the DFT X and over 0(k) bins used in the R-FFAST 
architecture to get. 


Pr{E) < number of iterations x number of bins x Pr{Ei,) 
= 0{k)Pr{Eb) 


(25) 


From (24) and (25), we note that in order to complete the reliability analysis of the R-FFAST algorithm, 
we need to show that Pr{Eh) < 0{l/k'^). 

The following lemma plays a crucial role in the analysis of the event Ej,. It analyzes the performance 
of an energy-based threshold rule to detect the presence of an arbitrary valued complex vector u, from 
its noise-corrupted samples. 


Lemma C.l. For an arbitrary valued complex vector u G and CAf{0, Idxd)> have, 

D{\\u\\yD-^r 


Pr{\\u Pz\\ <{l + y)D) < exp - 


2 + 4||m||2/D 


(26) 


for any constant 0 < 7 < H'ulp/Z). 
Proof: Please see Appendix D 


Without loss of generality, consider algorithm 2 processing the observations of bin b. In Section V, 
using signal processing identities, we have shown that the bin observation noise uJf, ~ CM{0, Idxd)- 
The processed bin b can either be a zero-ton bin, or a single-ton bin or a multi-ton bin. Next, we analyze 
the probability of success of the bin-processing algorithm 2 for all the three possibilities. 
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a) Analysis of a zero-ton bin: Consider a zero-ton bin b, with an observation yf, = Wb- Let be 
an event that a zero-ton bin is not identified as a ‘zero-ton’. Then, 

Pr{E,) = Pr{\\wb\\^>{l+y)D) 

= -f’(X2D > 2(1-b 7)L>) 

< 2exp{-Dy^/9) V 7 G [0,1/3], (27) 


where the last inequality follows from a standard concentration bound for Lipschitz functions of Gaussian 
variables, along with the fact that the Euclidean norm is a 1-Lipschitz function. Thus, 

Pr{E,) < 0{l/k‘^), if D = 0{logk) and 7 G [0,1/3]. (28) 


b) Analysis of a single-ton bin: Let fb = X[P\a{P} -)- Wb, be an observation vector of a single-ton 
bin with contribution from non-zero DFT coefficient at support i and value Xf], The steering vector 
a{P} (see (14) for detailed structure) is the column of the bin-measurement matrix. Let Eg be an 
event that a single-ton bin is not decoded correctly. The event Eg consists of the following three events. 

Single-ton bin is wrongly classified as a zero-ton bin: {Eg^] Let Eg^ denote an event that the 
single-ton bin fails the energy test of the Algorithm 1 and is classified as a zero-ton. 


Pr{Eg,) = 

< 

where the last inequality follows from using Lemma C.l, X[£\ = and the norm of steering vector 

\\a{i)\\ = VD. 

Either the support or the value of the single-ton bin is detected wrongly: Let Egg denote an 

event that the Algorithm 2 wrongly detects either the support or the value of the singleton bin, i.e.. 


Pr{\\yb\f <{l + y)D) 


Pr{\\X[l]d{t)+Wbf < {f+l)D) 


exp 


2 + 4p 


V 7 < p 


(29) 


Pr{Egg) < 2Pr{i i) + Pr{x]£] / X[£] \ £ = £) 

< 0{l/n^) + Pr{x\£] / X[£] \ £ = £) (30) 

where last inequality follows from proposition VIII. 1 for D = 0(log^/^ n) carefully chosen samples per 
bin. 

From Algorithm 2, we have X[£] = d{£)^yb/D = X[£] -\- D). Then, using the fact that 
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non-zero DFT coefficients take value from a M-PSK constellation with magnitude we have, 

Pr{x]e] ^X[e]\i = i) < Pr{\CM{0, l/D)\> ^sin(7r/M)) 

= exp(—D/ 9 sin^( 7 r/M)). (31) 

Hence, from (30) and (31) and using the fact that p,M are constants, we have, 

Pr{Ess) < 0{l/k^) (32) 

for D = 0{\og^/^ n), carefully chosen samples as per discussion in Section VII-B. 

Single-ton bin is wrongly classified as a multi-ton bin: \_Esm\ Let Egm be an event that a singleton 
hin b is processed hy the Algorithm 2 and classified as a multi-ton hin. Thus, 

Pr{Esm) < Pr{Esm I Ess) + Pr{Ess) 

= Pr{E^) + Pr{Ess) (33) 

Further using a union hound, we get an upper hound on the prohahility of event Eg as, 

Pr{Es) < Pr{Esz) + Pr{Ess) + Pr{Esm) 

Thus, from (28), (29), (32) and (33), we have, 

Pr{Es) < 0{l/k‘^), if D = Oiiog^^^ n) and 0 < 7 < min{l/3,p}. (34) 

c) Analysis of a multi-ton bin: Consider a multi-ton hin b, that has a contrihution from L > 2 
components. Then, the observation vector of this hin can he written as = Ai,v W},, where v £ 
is an L-sparse vector with L non-zero DFT coefficients, Afy is the hin measurement matrix and Wb ~ 
CM{0, Idxd)- Let Em he an event that a multi-ton hin is decoded as a single-ton hin with a support i 
and the non-zero DFT coefficient Xf] for some i, i.e., 

Pr{Em) <Pr{\\AbV-X[i]a{l) + Wb\? < {l + l)D) (35) 

First we compute a lower hound on the term \\AbV — 2f[£]a(f’)|p using the RIP Lemma VII.4. Let 
u = V — X[f\ei, where ci is a standard basis vector with 1 at location and 0 elsewhere. The vector u 
is either L — l,LorL-|-l sparse. Then, using the Lemmas VIL3, VIL4 and the fact that all the non-zero 
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components of u are constant, we have, 


\\A,v-X[^]a{l)f = W\\^ 

> czLpD{l - 2Lv'log(5n)/Z))+ (36) 

for some constant C3 > 0. Next, we compute an upper bound on the probability of the failure event Em, 

Pr{Em) < Pr{Em \ L < \og{k)) + Pr{L > log(A:)) 

< Pr{Em I L < log(fe)) + 0(l/fe2), (37) 


where in last inequality we have used the fact that the number of the non-zero DFT coefficients connected 
to any bin is a Binomial k) random variable (see [1] for more details), for some constant r/ > 0. 

Hence to show that Pr{Em) < 0(l/fc^), we need to show Pr{Em \ L < log(A;)) < 0{l/k‘^). Towards 
that end, 

Pr{Em \ L < log{k)) = PriWAbV - X[£]a{£) < {1+-/)D \ L <log{k)) 

2L^log(5n)/T)^^ - 7^ 

. — 2L^/log{5n)/D 

where in (a) we used the Lemma C.l and the lower bound from (36). 

Hence, 

Pr{Em) <0{l/k‘^), if D = 0{log^klogn), (38) 

and 0 < 7 < c^Lp ^1 — 2Ly/log{5n)/ dJ , for 2 < L < log k. 

d) Upper bound on the probability of event E;,: From (28), (34) and (38) we have, 

Pr{Eb) < 0{l/e), (39) 

for some constant 7 > 0, fixed p > 0 and D = 0{log^{k) log(n)). 



(a) 

< 


max exp 

2<L<log k 


-D (csLp(l - 




2 + AcsLp 


B. Sample and Computational complexity of R-FFAST 

In Section VII-A, we have shown that the R-FFAST front-end architecture, by construction, has a 
constant number of stages d and outputs 0{k) bins. Additionally, in Section C-Al, we have shown that 
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D = O(log^n), samples per bin are sufficient for reliable recovery of the DFT X. Hence, the total 
sample complexity of the R-FFAST algorithm in the presence of the observation noise is 

m = 0{k log^ n). 

Moreover, in [1], we show that the FFAST peeling-decoder processes each bin only constant number of 
times, i.e., peeling-iterations are constant. Thus, the total computational cost of the R-FFAST algorithm 
is. 


Total computational cost 

= cost of computing the front-end output -|- cost of back-end peeling 
= 0{k log k) X D X d + cost of back-end peeling 

< 0{k log^ n) -|- cost of back-end peeling 

0{k log^ n) + 0{k log^/^ n) 

= 0{klog^ n), (40) 

where (a) follows from proposition VIII. 1 and the fact that each bin is processed only constant number 
of times. ■ 


Appendix D 

Threshold based energy-detector 

In this section we provide a proof of the Lemma C.I. Let u £ be an arbitrary complex D 
dimensional vector and z £ CJ\f{0,lDxD)- Let 0 < 7 < ||tt|p/L>, be some constant. Then, using the tail 
bound for a non-central random variable (see equation (58b) in [26]) we get. 


Pr{\\u -h T||^ < (1 -h 'y)D) 

Pr{\\V2u + V2^\^ < 2D{l + -f)) 

DjWuW^/D 
2 + 4||u||2/^ 


< exp — 
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Appendix E 

Proof of proposition VI. 1 

Consider N noise-corrupted time-domain samples of a single complex sinusoid, obtained by sampling 
the signal at a periodic interval of 1, as follows, 


y{t) = + w{t), f = 0,1, 2,..., (iV - 1), 


(41) 


where the amplitude A, frequency cj, and the phase cj) are deterministic but unknown constants. The noise 
samples w{t) ~ CJ\f{0, 1). Then, as shown in (12) the MMSE estimate of the unknown frequency u is 
given by, 

where p = Then for any constant ci, there exists sufficiently large value of N such that, 

GIp 


vr 

Pr I leu — w > — 
Cl 


= Pr 


AA 0, 


iV(iV2 - 1) 


>-) 

Cl/ 




6/p 


= 2Q 


/7r2 iV(Ar2 _ 1) 


6/p 


7r2 N(N^ - 1) 
< 6/p 


< l/n^ 

where in the last inequality, we used N = (/(log^^^n). 


(43) 


Appendix E 

Proof of proposition VIII. 1 

Eet uji be an estimate of 2*tu obtained using the observations of cluster i as per estimation rule (11). 
Also let Qi := {oJi — -K/ci^Ui -|- vr/ci), i.e., 112*1 = 21:jcx. Define Ei an evenf fhaf ( 2 * 01 ) 27 * ^ 12*, where 
uj is fhe frue unknown frequency, of fhe form 27r^/n, corresponding fo fhe locafion ^ of a singlefon bin. 
Then, from proposition VI.l we have Pr(Ei) = Pr{Eo) < 1/n^ for N = n). Applying a union 
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bound over C clusters, we get, 


c-i 

i=0 

< C/n^ 

< l/n^ 


(44) 


where in the last inequality we used the value of (7 = O(logn). 
From (21) we know. 


27r 


27r 


Cl 


< 


n 


(45) 


for appropriate choice of the constants. Hence using (44), (45) and the fact that there are total of n possible 
frequencies 27r£/n,£ = 0 ,... ,n — 1, we conclude that the singleton-estimator algorithm identifies the 
true location of the non-zero coefficient of a singleton bin with probability at least 1 — 0(l/n^). 


A. Computational Complexity 

The computational complexity of processing samples of each cluster is = O(log^^^n). There are 
total of C = O(logn) clusters. Hence, the overall complexity of the singleton estimator algorithm is 

0{NC) = 0{\og^/^n). 
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